Legislative Prediction via Random Walks over a Heterogeneous Graph
Abstract
In this article, we propose a random walk-based model to predict legislators' votes on a set of bills. In particular, we first convert roll call data, i.e. the recorded votes and the corresponding deliberative bodies, into a heterogeneous graph in which both legislators and bills are treated as vertices. Three types of weighted edges are then computed, representing legislators' social and political relations, bills' semantic similarity, and legislator-bill vote relations. By performing two-stage random walks over this heterogeneous graph, we can estimate legislative votes on past and future bills. We apply the proposed method to real roll call data from the United States Congress and compare it with state-of-the-art approaches. The experimental results demonstrate the superior performance and unique predictive power of the proposed model.

1 Background and Motivation

Humanistic and social studies, including anthropology, criminology, marketing, sociology, and urban planning, are increasingly turning to data-driven quantitative methods, informatics, and predictive analytics. Political science is no different. Politics in democracies is centered on votes on bills in legislatures. Voting history, also known as roll call data, is an important historical record that has been studied statistically since the 1920s, if not earlier [27]. Following other political science studies, we focus on the legislature of the federal government of the United States of America, known as the Congress. An important feature of the United States Congress is that legislators are not bound to vote in lockstep with their party. In contrast to parliamentary governments, such as those that follow the Westminster system, party affiliation is not codified in the constitution and is thus only one of many factors that determine whether a legislator votes yea or nay.
Congress is a bicameral legislature composed of the Senate, with 100 members known as senators, and the House of Representatives, with 435 members known as representatives.¹ A session of Congress lasts two years; at the time of writing, the current session is the one hundred twelfth. The composition of Congress changes after every session due to elections; within a session, the only changes are due to death or resignation. A bill is a proposed law under consideration by a legislature; if passed, it becomes law. Approximately 700 bills are voted upon per session in the Senate and approximately 1200 in the House of Representatives. Each bill that comes to a vote in Congress is sponsored by at least one legislator. Other legislators may cosponsor the bill if they coauthored it or if they wish to publicly signal strong support for it in advance of the vote. Frequent cosponsorship therefore reflects collaboration and ideological similarity between legislators. Roll call data can be analyzed to obtain a variety of descriptive statistics, but it can also be used to develop predictive models. Legislative prediction leads to a better understanding of government and can also provide actionable insights to political strategists. Predicting the votes of all current legislators on a bill that has not yet been voted upon is a challenging task. One of the representative models in quantitative political science is the ideal point model (IPM), which builds a one-dimensional "political space" and places each legislator and bill in that space [7]. Recognizing the limitations of the IPM, such as its restriction to a low-dimensional space, researchers from the machine learning and data mining communities have recently proposed several more expressive methods, including the ideal point topic model [13], a joint model from the temporal perspective [35], and a multiple kernel learning model [29].
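To make the one-dimensional ideal point model concrete, here is a minimal sketch. It assumes a common formulation that this section does not spell out: each legislator $l$ has an ideal point $x_l$, each bill $n$ has a discrimination $a_n$ and an offset $b_n$, and $\Pr(v_{ln} = \text{yea}) = \sigma(x_l a_n + b_n)$. The function names `fit_ideal_points` and `predict_yea` are hypothetical. Fitting uses plain stochastic gradient ascent with a small L2 penalty, not any particular estimator from [7].

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_ideal_points(votes, num_legislators, num_bills,
                     epochs=500, lr=0.1, reg=0.01, seed=0):
    """Fit a hypothetical 1-D ideal point model by stochastic gradient ascent.

    votes: list of (l, n, v) triples with v = 1 for yea and v = 0 for nay.
    Returns legislator ideal points x, bill discriminations a, and bill offsets b.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 0.1) for _ in range(num_legislators)]  # points on the political axis
    a = [rng.gauss(0.0, 0.1) for _ in range(num_bills)]        # how strongly a bill splits the axis
    b = [0.0] * num_bills                                       # baseline tendency to pass
    for _ in range(epochs):
        for l, n, v in votes:
            g = v - sigmoid(x[l] * a[n] + b[n])  # d log-likelihood / d logit
            x_l, a_n = x[l], a[n]
            # Ascend the log-likelihood minus an L2 penalty (a Gaussian prior on all parameters).
            x[l] += lr * (g * a_n - reg * x_l)
            a[n] += lr * (g * x_l - reg * a_n)
            b[n] += lr * (g - reg * b[n])
    return x, a, b


def predict_yea(x, a, b, l, n):
    """Probability that legislator l votes yea on bill n."""
    return sigmoid(x[l] * a[n] + b[n])


# Toy roll call: legislators 0 and 1 vote together; 2 and 3 vote together.
votes = [(0, 0, 1), (1, 0, 1), (2, 0, 0), (3, 0, 0),
         (0, 1, 0), (1, 1, 0), (2, 1, 1), (3, 1, 1)]
x, a, b = fit_ideal_points(votes, num_legislators=4, num_bills=2)
```

The two blocs end up on opposite sides of zero. The sign of the axis itself is arbitrary, because flipping every $x_l$ and $a_n$ gives the same likelihood. Because each legislator is reduced to a single scalar, the model can express only one axis of disagreement, which is the low-dimension restriction noted above.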
In this article, we propose to leverage both text mining of bills and the social connections between legislators to predict legislative votes. In particular, we develop a novel model based on random walks on a heterogeneous graph (RWHG) to predict the vote links between legislators and bills. In this formulation, the roll call data is represented as a heterogeneous graph in which both legislators and bills are treated as vertices. The legislators are connected based on political relationships, specifically cosponsorship, and the bills are connected based on their semantic similarity in the bag-of-words representation space. The votes, yea or nay, are treated as directed edges of a bipartite-style legislator-bill graph (refer to Figure 1). Based on this formulation, a two-stage random walk is performed over the heterogeneous graph to iteratively generate vote links. Experimental results on predicting randomly missing votes and on sequentially predicting future votes show the superior performance of this method over state-of-the-art algorithms. In the remainder of this paper, we describe the heterogeneous graph formulation in Section 2. In Section 3, we present the RWHG model to predict vote links. The experimental results on real roll call data from the United States Congress are described in Section 4. We describe our main contributions, related work, conclusions, and directions for future work in Section 5.

¹ In this paper, we ignore non-voting delegates in the House of Representatives from territories, such as Guam, that are not states.

Copyright © SIAM. Unauthorized reproduction of this article is prohibited.

2 Heterogeneous Graph Formulation of Roll Call Data

In this section, we use a heterogeneous graph to represent the roll call data, where both the legislators and the bills are treated as graph vertices. The legislator vertices are connected based on their social and political relationships, quantified with edge weights.
Similarly, the bills are connected based on their estimated semantic similarity. The votes are treated as the links of a bipartite-style legislator-bill graph. Overall, this formulation yields a heterogeneous graph with two types of vertices and three types of edges.

2.1 Graph Notations

We first define the graph notation for the legislators. Assume there are a total of $L$ legislators, and denote the set of legislator vertices as $V^{(x)} = \{x_1, \dots, x_l, \dots, x_L\}$ with cardinality $|V^{(x)}| = L$. These legislators can be connected based on attributes such as party, state, age, gender, and cosponsorship, by converting the attributes into a political similarity measure between legislators. In other words, the legislators on their own form a graph $G^{(x)} = \{V^{(x)}, E^{(x)}\}$ with edge set $E^{(x)} = \{e^{(x)}_{lm}\} \subset V^{(x)} \times V^{(x)}$ ($l, m = 1, \dots, L$). The details of estimating political similarity, i.e. the edge weights, will be provided in the following subsection. In addition, we define the set of bills as $V^{(y)} = \{y_1, \dots, y_n, \dots, y_N\}$ with cardinality $|V^{(y)}| = N$. Given the text of each bill, we reuse the same symbol for its standard bag-of-words (BOW) representation, $y_n \in \mathbb{R}^B$, where $B$ is the size of the dictionary [14]. Accordingly, the bills form a graph in the semantic space, where the vertex set $V^{(y)}$ represents the bills and the edge set $E^{(y)} = \{e^{(y)}_{nk}\} \subset V^{(y)} \times V^{(y)}$ ($n, k = 1, \dots, N$) connects bills based on their semantic similarity. We thus have the bill graph $G^{(y)} = \{V^{(y)}, E^{(y)}\}$. The last piece of information we incorporate into the graph formulation is the initially given set of votes, i.e. the yea or nay results of the legislators voting on the bills. Since each vote involves two types of vertices, one legislator and one bill, it can be viewed as a special type of directed edge, or link, across these heterogeneous vertices.
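The notation above can be mirrored in a small data structure. The sketch below is illustrative only. It stores the three edge sets $E^{(x)}$, $E^{(y)}$, and $E^{(xy)}$ separately and builds BOW vectors $y_n$ over the words of the given bills. It also assumes cosine similarity as the semantic similarity measure, since this part of the paper defers the actual edge weighting to later subsections. The legislator weights are a hypothetical precomputed input, and encoding yea as +1 and nay as -1 is an assumption. The names `HeterogeneousGraph` and `build_graph` are likewise hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


def bag_of_words(text: str, vocabulary: list[str]) -> list[int]:
    """Term-count vector y_n over a fixed dictionary of size B = len(vocabulary)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine_similarity(u: list[int], v: list[int]) -> float:
    """Assumed semantic similarity between two BOW vectors (0.0 if either is empty)."""
    dot = sum(p * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


@dataclass
class HeterogeneousGraph:
    num_legislators: int                            # L = |V(x)|
    num_bills: int                                  # N = |V(y)|
    legislator_edges: dict[tuple[int, int], float]  # E(x): political similarity
    bill_edges: dict[tuple[int, int], float]        # E(y): semantic similarity
    vote_edges: dict[tuple[int, int], int]          # E(xy): directed edge l -> n


def build_graph(num_legislators, legislator_weights, bill_texts, votes):
    """Assemble G = {V, E} with E = E(x) U E(y) U E(xy).

    legislator_weights: hypothetical precomputed {(l, m): weight}.
    votes: {(l, n): +1 for yea, -1 for nay} (assumed encoding).
    """
    num_bills = len(bill_texts)
    vocabulary = sorted({w for text in bill_texts for w in text.lower().split()})
    vectors = [bag_of_words(text, vocabulary) for text in bill_texts]

    bill_edges = {}
    for n in range(num_bills):
        for k in range(num_bills):
            if n == k:
                continue
            sim = cosine_similarity(vectors[n], vectors[k])
            if sim > 0.0:  # only connect bills that share at least one word
                bill_edges[(n, k)] = sim

    for (l, n), label in votes.items():
        if not (0 <= l < num_legislators and 0 <= n < num_bills) or label not in (1, -1):
            raise ValueError(f"invalid vote {(l, n)}: {label}")

    return HeterogeneousGraph(num_legislators, num_bills,
                              dict(legislator_weights), bill_edges, dict(votes))


# Toy example: bills 0 and 1 share "small business"; bill 2 shares no words with them.
graph = build_graph(
    num_legislators=3,
    legislator_weights={(0, 1): 0.8, (1, 0): 0.8},
    bill_texts=["tax relief for small business", "small business loans", "farm subsidy"],
    votes={(0, 0): 1, (1, 0): 1, (2, 0): -1, (0, 1): 1},
)
```

Keeping the three edge sets apart, rather than merging them into one adjacency matrix, makes the distinction between undirected similarity edges and directed vote links explicit. A random walk implementation could still stack them into a single block matrix later.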
Treating votes as links gives the third component of the heterogeneous graph formulation, a bipartite-structured vote graph $G^{(xy)} = \{V, E^{(xy)}\}$, where $V = V^{(x)} \cup V^{(y)}$ and $E^{(xy)} = \{e^{(xy)}_{ln}\} \subset V^{(x)} \times V^{(y)}$ ($l = 1, \dots, L$, $n = 1, \dots, N$). In summary, the heterogeneous graph $G$ contains three subgraphs: the legislator graph $G^{(x)}$, the bill graph $G^{(y)}$, and the vote graph $G^{(xy)}$. In general form, we can write $G$ as

$$
\begin{aligned}
G &= \{V, E\}, \\
V &= V^{(x)} \cup V^{(y)}, \\
E &= E^{(x)} \cup E^{(y)} \cup E^{(xy)}, \\
E^{(x)} &\subset V^{(x)} \times V^{(x)}, \quad E^{(y)} \subset V^{(y)} \times V^{(y)}, \quad E^{(xy)} \subset V^{(x)} \times V^{(y)}.
\end{aligned}
\tag{2.1}
$$

In other words, graph $G$ has two types of heterogeneous vertices, legislators $V^{(x)}$ and bills $V^{(y)}$, and three types of edges: legislator political relations $E^{(x)}$, bill semantic similarity $E^{(y)}$, and directed vote links $E^{(xy)}$. In the following subsections, we detail the estimation of these edge weights and provide some important graph quantities.

2.2 Legislators' Social and Political Relations

Social connections among members of the House and Senate have been well studied in fields such as social science and political science, because they provide information for estimating political relevance and reveal underlying legislative patterns [11]. Different kinds of social connections, such as friendship, family, and acquaintanceship relations, have been identified as important influences on political positions [3]. However, predicting roll call data is more about understanding legislators' ideology than about the social relationships between them [11, 26]. Therefore, scholars have recently proposed using cosponsorship relations as a more robust and di
Publication date: 2012